Abstract



In this paper, we consider the revealed preferences problem from a learning perspective. Every day,
a price vector and a budget are drawn from an unknown distribution, and a rational agent buys his most
preferred bundle according to some unknown utility function, subject to the given prices and budget
constraint. We wish not only to find a utility function which rationalizes a finite set of observations, but
to produce a hypothesis valuation function which accurately predicts the behavior of the agent in the
future. We give efficient algorithms with polynomial sample-complexity for agents with linear valuation
functions, as well as for agents with linearly separable, concave valuation functions with bounded second
derivative.



1 Introduction 



Consider the problem of a market-researcher attempting to divine the preferences of a population of
consumers merely by observing their past buying behavior. Suppose, for example, that the researcher may
observe a consumer each day: every day, the consumer is faced with the choice to buy some subset of 
goods, each of which may have a different price. The consumer is facing an optimization problem - each 
day he attempts to buy the subset of goods that maximizes his utility function, given his budget constraints. 
The market-researcher, on the other hand, is facing a learning problem. Based on his observations of the 
consumer, he would like to learn a model for the agent's utility function that can explain his behavior, and 
that can be used to predict (and therefore optimally exploit) his future behavior. 

This is the "revealed preferences" problem, and it has received a great deal of attention in the economics 
literature (see, e.g., [Var06] for a nice survey). Typically, however, the work on the revealed preferences
problem has focused on determining whether a set of observations is rationalizable or not - i.e. whether 
it is consistent with any utility function that is monotone increasing in each good. A classic result in this 
literature is Afriat's Theorem, which roughly states that any finite set of observations is rationalizable if and 
only if it is rationalizable by a monotone increasing, piecewise linear, concave utility function. 

Note, however, that the problem of rationalizing is easier than the problem of learning. To rationalize a 
set of observations, it is sufficient to find a utility function which explains past behavior. Learning, however, 
requires finding a utility function which not only explains past behavior, but also will be predictive of future 
behavior! In particular, Afriat's theorem can be taken as showing that attempting to learn from the set of all 
monotone increasing, piecewise linear, concave utility functions is as hard (and as hopeless) as learning from 
the set of all utility functions. Indeed, Beigman and Vohra [BV06] have shown that this class of functions has
infinite fat-shattering dimension, and so without further restricting the set of allowable utility functions, no
accurate predictions can in general be made after any finite set of observations, even by inefficient learning
algorithms!

In this paper, we initiate the study of efficiently (in terms of both computational complexity and sample 
complexity) learning utility functions which can accurately predict future purchases of a utility-maximizing 
agent, given access to past purchase behavior. We necessarily restrict the class of agent utility functions, 
and consider both linear utility functions, and linearly separable concave utility functions with bounded 
second derivative. We give polynomial upper and lower bounds on the sample complexity (i.e. the number
of observations) required for learning, as well as efficient algorithms that can learn predictive models from 
polynomially many observations. 

1.1 Our Results 

We consider a model in which an agent has an unknown utility function over a set of n divisible goods. We 
get to observe the behavior of the agent, who every day faces a set of prices for each good, together with 
a budget constraint, which is drawn from a fixed but unknown probability distribution. The agent selects a 
bundle of goods to buy so as to maximize his utility function subject to his budget constraint, and the goal 
of a learning algorithm is to impute a model for his utility function that correctly predicts his behavior with 
high probability on future price/budget instances drawn from the same distribution. 

We consider both linear utility functions, and then more generally, linearly separable concave utility 
functions with bounded derivatives. For both of these cases, we give efficient learning algorithms with
polynomially bounded sample complexity. We then consider a relaxed model in which our algorithm receives
expanded feedback from the agent during the learning stage, and is permitted to predict bundles that are 
within a small additive error of the agent's optimal bundle. In this relaxed model, we give a polynomial time 
learning algorithm with improved sample complexity bounds. 

1.2 Related Work 

Work on the "revealed preferences problem" has a long history in economics, beginning with the seminal 
work of Samuelson [Sam38]. Modern work on revealed preferences, in which explanatory utility
functions are constructively generated from finitely many agent price/purchase observations, began with Afriat
[Afr65, Afr67], who showed (via an algorithmic construction) that any finite sequence of observations is
rationalizable if and only if it is rationalizable by a piecewise linear, monotone, concave utility function. We
will not attempt to review the extremely large body of work on revealed preferences, and instead refer the 
reader to an excellent survey of Varian [Var06]. 

Algorithms that constructively generate utility functions given a finite set of observations can be viewed 
as learning algorithms for the set of all monotone increasing utility functions. These algorithms typically 
come with a caveat, however, that the hypothesis utility functions they generate have the same description
length as the set of observations that they were generated from, and so tend to overfit the data - this
observation is related to a recent paper of Echenique, Golovin, and Wierman [EGW11], who gave a
thought-provoking result: that any set of rationalizable observations can in fact be rationalized by a utility function
which is computationally easy to optimize. However, such a utility function clearly cannot be predictive of
the future behavior of an agent who is in fact making his decisions based on an intractable utility function, 
because the hypothesis produced by the learning algorithm would itself be witness to the existence of a 
polynomially sized circuit for optimizing the purportedly intractable utility function of the agent. 

Most related to our work is the work of Beigman and Vohra [BV06], who first pose the revealed
preferences problem in the model of computational learning theory, with a distribution over observations and
the explicit goal of producing a predictive hypothesis. They show that the set of all monotone utility
functions has infinite fat-shattering dimension, and therefore prove that (without restricting the class of allowable
utility functions), there does not exist any algorithm (independent of computational efficiency) which can
provide any non-trivial predictive guarantees from any finite number of samples, over every distribution over 
observations. They also show that if the agent utility functions satisfy a certain bounded-jump condition, 
then the resulting class in fact has finite fat-shattering dimension, and that predictive learning is therefore 
possible using a finite number of samples. We continue this line of work by considering specific, simple 
classes of utility functions, and give efficient learning algorithms together with small polynomial upper and 
lower bounds on the sample complexity necessary for learning. 

A very nice recent line of work by Balcan and Harvey, and Balcan et al. [BH11, BCIW12] considers
a related problem of learning valuation functions. This is similar in motivation, but is orthogonal to the 
revealed preference setting considered here because it uses direct access to the valuation function evaluated 
on bundles, rather than only the "revealed" preference of the user, which is the maximum value bundle 
selected subject to some cost constraint. 

2 Preliminaries 

We consider the revealed preferences problem for an agent who, when faced with a set of prices over $n$ goods
$[n]$, buys the most valued bundle available to him. A bundle of goods is a vector of quantities $x \in [0,1]^n$,
one for each good: $x_i$ represents the fraction of good $i$ that is in the bundle. The goods are divisible: i.e.
bundles can be arbitrary real valued vectors $x \in [0,1]^n$.

The agent has a value function $v : [0,1]^n \to \mathbb{R}$. His value for a bundle $x \in [0,1]^n$ is simply $v(x)$. Goods
can also be paired with vectors of non-negative prices $p \in \mathbb{R}^n_+$, where $p_i$ is the price for good $i$. The price
of a bundle is linear in the goods in the bundle. The price of a bundle $x$ with respect to prices $p$ is therefore
simply $x \cdot p$. Prices are important, because the agent may be faced with a budget constraint $B$: he can only
buy bundles $x$ such that $x \cdot p \le B$.

The agent is a utility maximizer. When faced with a price vector $p$ and a budget $B$, he will choose to
buy the bundle that maximizes his value subject to his budget constraint. That is, he will choose the bundle:

$$x^*(v, p, B) = \arg\max_{x \in [0,1]^n \,:\, x \cdot p \le B} v(x)$$

We will consider several types of value functions in this paper. A linear value function $v$ is defined by a
vector $v \in \mathbb{R}^n_+$, where $v_i$ is the marginal value of good $i$. In this case, $v(x) = v \cdot x$. More generally, we can
consider linearly separable concave utility functions. A value function $v$ is linearly separable and concave
if it can be described using concave functions $v_1, \ldots, v_n$ where each $v_i : [0,1] \to \mathbb{R}_+$ is a one-dimensional
real valued function, and we can evaluate $v(x) = \sum_{i=1}^n v_i(x_i)$.

The revealed preferences problem is to recover a value function that can explain a sequence of choices 
that the agent was observed to make. In this paper, we wish to recover a value function that can not only 
rationalize observed behavior, but can help predict future behavior. In order for this to be a meaningful task, 
we must assume that the choices presented to the agent are drawn from some distribution. 

Definition 2.1. An example is a price vector $p \in \mathbb{R}^n_+$ paired with a budget $B \in \mathbb{R}_+$. A distribution over
examples $\mathcal{D}$ is simply a distribution over $(p, B) \in [0,1]^n \times \mathbb{R}_+$.

Definition 2.2. An observation of an agent with value function $v$, $(p, B, x^*(p, B, v)) \in \mathbb{R}^n_+ \times \mathbb{R}_+ \times \mathbb{R}^n_+$,
is simply a triple consisting of a price vector $p$, a budget $B$, and a bundle $x^*(p, B, v)$ chosen by the agent
given $p$ and $B$: i.e. a bundle $x$ that maximizes $v(x)$ subject to $x \cdot p \le B$.

Definition 2.3. An algorithm $A$ $\delta$-learns a class of value functions $\mathcal{V}$ from $m = m(\delta)$ observations if for
every distribution $\mathcal{D}$ over examples and for every value function $v \in \mathcal{V}$, given a set of $m$ observations
$\{(p_i, B_i, x^*(p_i, B_i, v))\}_{i=1}^m$ where examples $(p_i, B_i)$ are drawn i.i.d. from $\mathcal{D}$, with probability $1 - \delta$ it
produces a hypothesis $\hat{v}$ such that:

$$\Pr_{(p,B) \sim \mathcal{D}} \left[ v(x^*(p, B, \hat{v})) = v(x^*(p, B, v)) \right] \ge 1 - \delta.$$

We say that $A$ is efficient if both its run-time and its sample complexity $m(\delta)$ are bounded by some
polynomial $p(n, 1/\delta)$. We say that the sample complexity of learning $\mathcal{V}$ is at most $m^* = m^*(\delta)$ if there is some
algorithm $A$ which $\delta$-learns $\mathcal{V}$ from $m(\delta) \le m^*(\delta)$ observations.

Remark 2.4. Note that a learning algorithm must with high probability (over the choice of observations and 
coins of the mechanism) produce a value function which most of the time (over draws of examples) selects 
a bundle which is equal to the bundle that the agent would have selected. 

In Section 5 we relax our definition of learning to allow our learning algorithm to predict bundles which
are only approximately optimal to the agent, rather than requiring that it select the exactly correct bundle. 
Note that such approximately optimal bundles might look very different from exactly optimal bundles, and 
so we will also need to allow our learning algorithms to receive richer feedback from the agent. 

Definition 2.5. An algorithm $A$ $(\epsilon, \delta)$-learns a class of value functions $\mathcal{V}$ from $m = m(\delta)$ observations if
for every distribution $\mathcal{D}$ over examples and for every value function $v \in \mathcal{V}$, given a set of $m$ observations
$\{(p_i, B_i, x^*(p_i, B_i, v))\}_{i=1}^m$ where examples $(p_i, B_i)$ are drawn i.i.d. from $\mathcal{D}$, with probability $1 - \delta$ it
produces a hypothesis $\hat{v}$ such that:

$$\Pr_{(p,B) \sim \mathcal{D}} \left[ v(x^*(p, B, \hat{v})) \ge v(x^*(p, B, v)) - \epsilon \right] \ge 1 - \delta.$$

For this notion of additive approximation to be meaningful, we will typically normalize the target utility 
function v to lie in the range [0, 1]. 

3 All Pairs Comparisons Algorithm: Learning Linear Valuation Functions 

In this section, we present an algorithm that efficiently $\delta$-learns the class of all linear valuation functions
given a set of $m = O(n^2 \ln(n^2/\delta)/\delta)$ observations. In particular, this provides a quadratic upper bound on the
optimal sample complexity $m^*(\delta)$ for learning linear valuation functions. We note that a linear $\Omega(n)$ lower
bound is immediate in this setting. We start by characterizing the optimal bundle for an agent maximizing a
linear utility function, and give intuition for our learning algorithm.

Let $v^*$ and $p$ denote some fixed value and price vectors respectively, and let $B$ denote some fixed budget.
We denote the optimal bundle (according to the linear utility function defined by value vector $v^*$, price vector
$p$, and budget $B$) by $x^*$. Recall that the value of the optimal bundle is $v^* \cdot x^*$, and its cost $p \cdot x^*$ is at most
the budget $B$. Observe that in choosing bundle $x^*$, the agent is solving a divisible knapsack problem, and
so the following structural lemma is immediate.






Lemma 3.1. For any pair of goods $i, j \in [n]$ with $x_i^* > x_j^*$, it must be that:

$$\frac{v_i^*}{p_i} \ge \frac{v_j^*}{p_j}$$

Equivalently, for any pair of goods with $\frac{v_i^*}{p_i} > \frac{v_j^*}{p_j}$, the optimal bundle "prefers" good $i$ over good $j$ (it
will never buy any of good $j$ until it has exhausted the supply of good $i$). Our algorithm is based on this
structural characterization, and operates by maintaining upper and lower bounds on each of the $n^2$ ratios $\frac{v_i}{v_j}$
for $i \ne j \in [n]$. Based on this transitive relation, we can sort the goods, and find the optimal bundle by
buying the goods one by one starting from high priority goods until the budget $B$ is spent completely. In this
optimal bundle, we have at most one fractional item. In our algorithm, we try to learn the ratios $\frac{v_i}{v_j}$ accurately
for all pairs of goods with high probability.
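
The greedy procedure described above can be sketched in Python as follows. This is a minimal illustration, assuming strictly positive prices and values; the function name `optimal_bundle_linear` is ours and does not appear in the algorithm figures.

```python
from typing import List


def optimal_bundle_linear(v: List[float], p: List[float], budget: float) -> List[float]:
    """Greedy demand for a linear value function over divisible goods in [0, 1].

    Assumes every price and value is strictly positive (an assumption of this sketch).
    """
    n = len(v)
    x = [0.0] * n
    remaining = budget
    # Visit goods in decreasing order of value per unit of money, v_i / p_i.
    order = sorted(range(n), key=lambda i: v[i] / p[i], reverse=True)
    for i in order:
        if remaining <= 0:
            break
        # Buy as much of good i as the supply (1 unit) and the budget allow.
        x[i] = min(1.0, remaining / p[i])
        remaining -= x[i] * p[i]
    return x
```

With values `[3, 1, 4]`, prices `[1, 1, 2]` and budget `2`, the ratios are 3, 1 and 2, so the sketch buys all of the first good and half of the third; at most one entry of the result is fractional.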

AllPairsLearn($\delta$):

Training Phase:

1. Let $E$ be a set of $m = O(n^2 \ln(n^2/\delta)/\delta)$ observations $(p, B, x^*(p, B, v))$.

2. Initialize bounds $(L_{ij}, U_{ij})$ for each $i \ne j \in [n]$. Initially $L_{ij} = 0$ and $U_{ij} = \infty$ for all $i, j$.

3. For each $(p, B, x^*) \in E$:

(a) For each $i \ne j \in [n]$:

i. If $x_i^* > x_j^*$, let $L_{ij} = \max(L_{ij}, \frac{p_i}{p_j})$

ii. If $x_j^* > x_i^*$, let $U_{ij} = \min(U_{ij}, \frac{p_i}{p_j})$

Classification Phase:

1. On a new example $(p, B)$ let $v' \in [0,1]^n$ be any vector such that for all $i \ne j \in [n]$, $\frac{v'_i}{v'_j} \in [L_{ij}, U_{ij}]$.
Predict bundle $x'(p, B, v')$ that results from maximizing $v'$ with respect to prices $p$ and budget
constraint $B$.

Figure 1: The All Pairs Comparison Algorithm for Learning Linear Valuation Functions. It takes as input
an accuracy parameter $\delta$.
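
As a rough illustration of the training phase in Figure 1, the following Python sketch maintains the pairwise bounds from a list of observations. It assumes strictly positive prices; the names `train_all_pairs` and `certified_preference` are hypothetical helpers of ours, and the classification step of finding a consistent vector $v'$ is not implemented here.

```python
import math
from typing import List, Optional, Tuple

# One observation: (price vector, budget, bundle chosen by the agent).
Observation = Tuple[List[float], float, List[float]]


def train_all_pairs(observations: List[Observation], n: int):
    """Maintain bounds L[i][j] <= v_i / v_j <= U[i][j] from observed bundles."""
    L = [[0.0] * n for _ in range(n)]
    U = [[math.inf] * n for _ in range(n)]
    for p, _budget, x in observations:
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                ratio = p[i] / p[j]
                if x[i] > x[j]:
                    # v_i / p_i >= v_j / p_j, hence v_i / v_j >= p_i / p_j.
                    L[i][j] = max(L[i][j], ratio)
                elif x[j] > x[i]:
                    # v_j / p_j >= v_i / p_i, hence v_i / v_j <= p_i / p_j.
                    U[i][j] = min(U[i][j], ratio)
    return L, U


def certified_preference(L, U, p: List[float], i: int, j: int) -> Optional[bool]:
    """True if the bounds prove v_i/p_i >= v_j/p_j, False if they prove
    v_i/p_i <= v_j/p_j, and None if the bounds do not decide the comparison."""
    ratio = p[i] / p[j]
    if L[i][j] >= ratio:
        return True
    if U[i][j] <= ratio:
        return False
    return None
```

A comparison that comes back `None` on a new price vector is exactly the situation the analysis has to show is rare.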

The intuition is that in order to find the optimal bundle $x^*$, we need only know bounds on the ratios of
the values of pairs of goods for which unequal quantities are purchased in the optimal bundle. So if we know
that $\frac{v_i}{p_i} \ge \frac{v_j}{p_j}$ for any pair of goods with $x_i^* > x_j^*$, we can find the optimal bundle $x^*$. We need not know the
values themselves - it is sufficient to bound these ratios. For example, if the lower bound $L_{ij}$ is at least $\frac{p_i}{p_j}$,
we can infer that good $i$ is preferred to good $j$. If we can infer all these preferences for pairs of goods $(i, j)$
with $x_i^* \ne x_j^*$, we can find the optimal bundle as well. Below we show that with high probability after
observing $m = O(n^2 \ln(n^2/\delta)/\delta)$ i.i.d. examples we can find the optimal bundle.

Theorem 3.2. AllPairsLearn($\delta$) efficiently $\delta$-learns the class of linear valuation functions given
$m = O(n^2 \ln(n^2/\delta)/\delta)$ observations.

Proof. For each pair of goods $(i, j)$, we define $a_{ij}$ and $b_{ij}$ as follows:

$$a_{ij} = \min\left\{ a \;\middle|\; a \le \frac{v_i}{v_j} \ \&\ \Pr\left[ x_i^* > x_j^* \ \&\ \frac{p_i}{p_j} \in \left[a, \frac{v_i}{v_j}\right] \right] \le \frac{\delta}{n^2} \right\}$$

$$b_{ij} = \max\left\{ b \;\middle|\; b \ge \frac{v_i}{v_j} \ \&\ \Pr\left[ x_j^* > x_i^* \ \&\ \frac{p_i}{p_j} \in \left[\frac{v_i}{v_j}, b\right] \right] \le \frac{\delta}{n^2} \right\}$$

where $p$ is the price vector drawn from the distribution $\mathcal{D}$, and $x^*$ is its optimal bundle. Every time an
i.i.d. example is drawn, with probability $\delta/n^2$, the lower bound $L_{ij}$ becomes at least $a_{ij}$, and the upper
bound $U_{ij}$ becomes at most $b_{ij}$, for every pair $(i, j)$. For each pair $(i, j)$ after $m$ observations, $L_{ij}$ is less
than $a_{ij}$ with probability at most $(1 - \delta/n^2)^m \le e^{-\ln(n^2/\delta)} \le \delta/n^2$. A similar argument holds for $U_{ij}$.
Using the union bound, we have that with probability $1 - \delta$, every $L_{ij}$ is at least $a_{ij}$, and every $U_{ij}$ is at
most $b_{ij}$.

Now when a new example $(p', B', x'(p', B', v))$ arrives ($x'$ is the optimal bundle), the probability that
$x'_i \ne x'_j$ and we cannot infer which of these two items is preferred over the other, i.e. $\frac{p'_i}{p'_j} \in [L_{ij}, U_{ij}]$,
is at most $2\delta/n^2$, because we know that $[L_{ij}, U_{ij}] \subseteq [a_{ij}, b_{ij}]$. Using the union bound, with probability $1 - \delta$
we can derive all preference relations for items with unequal fractions in the optimal bundle $x'$. In
other words, with probability $1 - \delta$, we can find the optimal bundle $x'$.

□ 
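
To make the counting step of the proof concrete, the short Python sketch below evaluates the sample size with the hidden constant in the $O(\cdot)$ taken to be 1 (an assumption of this sketch, as are the helper names) and checks that the per-pair failure probability $(1 - \delta/n^2)^m$ indeed drops below $\delta/n^2$.

```python
import math


def sample_size(n: int, delta: float) -> int:
    """m = n^2 ln(n^2 / delta) / delta, rounded up (constant factor assumed to be 1)."""
    return math.ceil(n ** 2 * math.log(n ** 2 / delta) / delta)


def pair_failure_bound(n: int, delta: float, m: int) -> float:
    """Probability that m i.i.d. draws all miss an event of probability delta / n^2."""
    return (1.0 - delta / n ** 2) ** m


if __name__ == "__main__":
    n, delta = 5, 0.1
    m = sample_size(n, delta)
    print(m, pair_failure_bound(n, delta, m), delta / n ** 2)
```

The inequality holds for every $n$ and $\delta$ because $(1 - q)^m \le e^{-qm}$ with $q = \delta/n^2$.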



4 Learning Linearly Separable Concave Utility Function 

In this section, we modify the algorithm presented in Section 3 to learn the class of linearly separable
concave utility functions. Recall that agents with linearly separable utility functions have a separate function
$v_i : [0,1] \to \mathbb{R}_+$ for each $1 \le i \le n$, and their utility for bundle $x$ is $\sum_{i=1}^n v_i(x_i)$. We assume that
each utility function is a concave function with bounded second derivative. Concavity corresponds to a
decreasing marginal utility condition: that buying an additional $\epsilon$ fraction of item $i$ increases agent utility
more when we have less of item $i$: $v_i(a + \epsilon) - v_i(a) \ge v_i(b + \epsilon) - v_i(b)$ for any $a \le b$. Our bounded second
derivative assumption states that the second derivative of each utility function has some supremum strictly
less than $\infty$.

We first characterize optimal bundles, and then adapt our learning algorithm for linear valuation
functions to apply to the class of linearly separable concave utility functions.

Fix a utility function $v^* = \{v_i^* : [0,1] \to \mathbb{R}_+\}_{i=1}^n$ and a price/budget pair $(p, B)$. The corresponding
optimal bundle can be characterized as follows. For any threshold $\tau \ge 0$, define $x_i^\tau$ to be
$\max\{f \mid f \in [0,1] \ \&\ \frac{v_i^{*\prime}(f)}{p_i} \ge \tau\}$, where $v_i^{*\prime}(f)$ is the first derivative of function $v_i^*$ at point $f$. We can now define $p^\tau$ to be
$\sum_{i=1}^n p_i x_i^\tau$. We will show that the optimal bundle $x^*$ for $v^*$ in the face of price/budget pair $(p, B)$ is the
vector such that $x_i^* = x_i^\tau$ for each $1 \le i \le n$ where $\tau$ is the maximum value such that this bundle does not
exceed the budget constraint. The following lemma is proved in Appendix A.

Lemma 4.1. The optimal bundle $x^*$ for pair $(p, B)$ is equal to $x^\tau$ where $\tau$ is $\max\{\tau \mid p^\tau \le B\}$.
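
The threshold characterization can be illustrated numerically. The Python sketch below is only an example under strong assumptions: it uses the hypothetical quadratic family $v_i(x) = a_i x - \frac{b_i}{2} x^2$ with $a_i \ge b_i > 0$ (so that $v_i'(f) = a_i - b_i f$ is non-negative on $[0,1]$), strictly positive prices, and function names of our choosing. Because $x_i^\tau$ shrinks as $\tau$ grows, the sketch bisects for the threshold at which the cost $p^\tau$ meets the budget.

```python
from typing import List


def demand_at_threshold(a: List[float], b: List[float], p: List[float], tau: float) -> List[float]:
    """x_i^tau: the largest f in [0, 1] with (a_i - b_i * f) / p_i >= tau.

    If no f qualifies, the sketch buys nothing of that good.
    """
    return [min(1.0, max(0.0, (ai - tau * pi) / bi)) for ai, bi, pi in zip(a, b, p)]


def cost(p: List[float], x: List[float]) -> float:
    return sum(pi * xi for pi, xi in zip(p, x))


def optimal_bundle_separable(a, b, p, budget: float, iters: int = 100) -> List[float]:
    """Bisect on tau; cost(tau) is continuous and non-increasing in tau."""
    lo, hi = 0.0, max(ai / pi for ai, pi in zip(a, p))
    if cost(p, demand_at_threshold(a, b, p, lo)) <= budget:
        # The budget covers everything the agent wants at threshold 0.
        return demand_at_threshold(a, b, p, lo)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if cost(p, demand_at_threshold(a, b, p, mid)) > budget:
            lo = mid  # bundle too expensive: raise the threshold
        else:
            hi = mid  # bundle affordable: try a lower threshold
    return demand_at_threshold(a, b, p, hi)
```

The returned bundle is always affordable, since the bisection keeps `hi` on the side where the cost does not exceed the budget.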

The intuition for our algorithm now follows from the linear utility case. From each observation
consisting of an example and its optimal bundle, we may infer some constraints on the derivatives of utility
functions at various points. Just as in the linear utility case, these are the only pieces of information we need 
to infer the optimal bundle. 

LinearSeparableLearn($\epsilon$, $\delta$):

Training Phase:

1. Let $E$ be a set of $m = O((n(k+2))^2 \ln((n(k+2))^2/\delta)/\delta)$ observations $(p, B, x^*(p, B, v))$.

2. Initialize bounds $(L(i,r,j,s), U(i,r,j,s))$ for each $i \ne j \in [n]$ and $r, s \in [k]$ defined in
Definition 4.2. Initially $L(i,r,j,s) = 0$ and $U(i,r,j,s) = \infty$.

3. For each $(p, B, x^*) \in E$:

(a) For each $i \ne j \in [n]$:

i. If $x_i^* > x_j^*$, let $L(i, \lfloor k x_i^* \rfloor, j, \lceil k x_j^* \rceil) = \max(L(i, \lfloor k x_i^* \rfloor, j, \lceil k x_j^* \rceil), \frac{p_i}{p_j})$

ii. If $x_j^* > x_i^*$, let $U(i, \lceil k x_i^* \rceil, j, \lfloor k x_j^* \rfloor) = \min(U(i, \lceil k x_i^* \rceil, j, \lfloor k x_j^* \rfloor), \frac{p_i}{p_j})$

Classification Phase:

1. On a new example $(p, B)$ find thresholds $\{l_i\}_{i=1}^n$ such that $\frac{V(i, l_i)}{p_i} \ge \frac{V(j, l_j + 1)}{p_j}$ for each pair $i, j \in [n]$,
and $\sum_{i=1}^n p_i \frac{\max\{l_i, 0\}}{k} \le B \le \sum_{i=1}^n p_i \frac{\min\{l_i + 1, k\}}{k}$. Buy $l_i/k$ fraction of object $i$ for every $i \in [n]$,
and spend the remaining budget to buy equal fraction of all objects.

Figure 2: The Learning Algorithm for Linearly Separable Valuation Functions. It takes as input an accuracy
parameter $\delta$, and an error parameter $\epsilon$.



Unlike the linear utility setting, however, it is not possible to maintain bounds on all ratios of derivatives
of utility functions at all relevant points, because there is a continuum of points and the derivatives may
take a distinct value at each point. Instead, we discretize the interval $[0,1]$ with $k+1$ equally spaced points
$0, 1/k, 2/k, \ldots, 1$ for some positive integer value of $k$, and maintain bounds on the ratios of the derivatives
at these points.

Definition 4.2. We let $k$ be an integer at least $\lceil (2Q/\epsilon) \cdot \max_{(p,B) \sim \mathcal{D},\, 1 \le i,j \le n} \{p_i/p_j\} \rceil$, where $Q$ is an upper
bound on $v_i''(x)$ over all $i$ and $x \in [0,1]$, and $\epsilon$ is the error to which we are happy to learn. We define
$V(i, l) = v_i'(l/k)$ for item $i$, $1 \le i \le n$, and discretization step $l$, $0 \le l \le k$. For convenience, we define
$V(i, k+1) = 0$. For any pair $1 \le i, j \le n$, and $0 \le r, s \le k$, we define $L(i,r,j,s)$ and $U(i,r,j,s)$ to
be the lower and upper bounds on the ratio $\frac{V(i,r)}{V(j,s)}$. The lower and upper bounds are initialized to zero and $\infty$
respectively.

Analogously to the linear case, our algorithm will maintain upper and lower bounds on the pairwise
ratios between each of these $n(k+2)$ variables. Since the utilities are concave, we will also maintain
the constraint that $V(i, l) \le V(i, l-1)$ for any $1 \le i \le n$ and $1 \le l \le k+1$ throughout the course of the
algorithm.

In the training phase, the algorithm selects $m = O((n(k+2))^2 \log((n(k+2))^2/\delta)/\delta)$ observations.
Note the similarity in the number of examples here as compared to the linear case: this is no coincidence.
Instead of maintaining bounds on the pairwise ratios of $n$ derivatives we are maintaining bounds on the
pairwise ratios between $n(k+2)$ derivatives.

Consider the inequalities we can infer from each observation $(p, B, x^*)$. By our optimality
characterization, we know that for any pair of items $i$ and $j$ with $x_i^* > 0$ and $x_j^* < 1$, we must have: $\frac{v_i'(x_i^*)}{p_i} \ge \frac{v_j'(x_j^*)}{p_j}$.
We therefore can obtain the following inequality:

$$\frac{V(i, \lfloor k x_i^* \rfloor)}{p_i} \ge \frac{v_i'(x_i^*)}{p_i} \ge \frac{v_j'(x_j^*)}{p_j} \ge \frac{V(j, \lceil k x_j^* \rceil)}{p_j}$$

The above inequality defines the update step that we can impose on the lower bound $L(i, l', j, l'')$ and
upper bound $U(i, l', j, l'')$ on the ratios $\frac{V(i, l')}{V(j, l'')}$ where $l' = \lfloor k x_i^* \rfloor$ and $l'' = \lceil k x_j^* \rceil$, analogously to our
algorithm's update for the linear case. For each example, we update these bounds appropriately.
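
A sketch of the lower-bound update implied by this inequality is given below in Python. The dictionary representation, the function name, and the small tolerance `tol` used to guard the floor and ceiling against floating-point noise are all assumptions of this sketch; only the lower-bound half of the update is shown.

```python
import math
from typing import Dict, List, Tuple

Key = Tuple[int, int, int, int]  # (i, r, j, s) indexes the ratio V(i, r) / V(j, s)


def update_lower_bounds(L: Dict[Key, float], p: List[float], x: List[float],
                        k: int, tol: float = 1e-9) -> None:
    """Apply one observation (p, x) to the lower bounds on V(i, l') / V(j, l'')."""
    n = len(p)
    for i in range(n):
        for j in range(n):
            # The optimality condition applies when x_i > 0 and x_j < 1.
            if i == j or not (x[i] > 0.0 and x[j] < 1.0):
                continue
            l1 = math.floor(k * x[i] + tol)  # l'  = floor(k * x_i)
            l2 = math.ceil(k * x[j] - tol)   # l'' = ceil(k * x_j)
            key = (i, l1, j, l2)
            # V(i, l') / p_i >= V(j, l'') / p_j, so the ratio is at least p_i / p_j.
            L[key] = max(L.get(key, 0.0), p[i] / p[j])
```

Missing keys are treated as the initial lower bound of zero, which matches the initialization in Definition 4.2.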

After the training phase completes, our algorithm uses these bounds to predict a bundle for a new
example $(p, B)$. The algorithm attempts to find some threshold $-1 \le l_i \le k$ for each item $i$ such that the
following two properties hold. We define $V(i, -1) = \infty$ for each $1 \le i \le n$.

• For each pair of items $i \ne j \in [n]$, upper and lower bounds imply that $\frac{V(i, l_i)}{p_i} \ge \frac{V(j, l_j + 1)}{p_j}$.

• We have that: $\sum_{i=1}^n p_i \frac{\max\{l_i, 0\}}{k} \le B \le \sum_{i=1}^n p_i \frac{\min\{l_i + 1, k\}}{k}$. In other words, there is enough budget
to buy $\max\{l_i, 0\}/k$ fraction of object $i$ for all $1 \le i \le n$, and the total cost of buying $\min\{l_i + 1, k\}/k$
fraction of each item $i$ is at least $B$.

After finding these thresholds $l_1, l_2, \ldots, l_n$, our algorithm selects a bundle that contains $\max\{l_i, 0\}/k$
units of item $i$ for each $i$, and then spends the rest of the budget (if there is any remaining) to buy an equal
fraction of all objects with $0 \le l_i < k$, i.e. if $B'$ of the budget remains after the first step, we buy
$\frac{B'}{\sum_{1 \le i \le n,\ 0 \le l_i < k} p_i}$ units of each object $i$ with $0 \le l_i < k$. Intuitively, the objects with $l_i = -1$ represent
very expensive objects (in comparison to their values) which we prefer not to buy at all. On the other hand,
we have already exhausted the supply of objects with $l_i/k = 1$.
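
Once thresholds are available, the bundle construction just described is mechanical. The following Python sketch assumes the thresholds are given as integers in $\{-1, 0, \ldots, k\}$ and satisfy the two properties above; the function name is ours.

```python
from typing import List


def bundle_from_thresholds(l: List[int], p: List[float], budget: float, k: int) -> List[float]:
    """Buy max(l_i, 0)/k of each good, then spread the leftover budget B'
    equally (in units) over the goods with 0 <= l_i < k."""
    x = [max(li, 0) / k for li in l]
    remaining = budget - sum(pi * xi for pi, xi in zip(p, x))
    active = [i for i, li in enumerate(l) if 0 <= li < k]
    if remaining > 0 and active:
        # Each active good receives B' / (sum of active prices) extra units.
        extra = remaining / sum(p[i] for i in active)
        for i in active:
            x[i] += extra
    return x
```

When the second property holds, the extra amount per good is at most $1/k$, so no entry exceeds $\min\{l_i + 1, k\}/k$.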

In the rest of this section, we show in Lemma 4.3 (proved in Appendix A) how to find these thresholds
(the sequence $l_i$ for $1 \le i \le n$) based on the learned upper and lower bounds on ratios, if such thresholds
exist. Then, we prove in Lemma 4.4 that after training on $m$ examples, with high probability (at least $1 - 2\delta$),
this sequence of thresholds indeed exists. Finally we conclude that our algorithm is an $(\epsilon, \delta)$-learner.

Lemma 4.3. Assuming there exists a sequence of thresholds $\{l_i\}_{i=1}^n$ with the two desired properties in our
algorithm, there exists a polynomial time algorithm to find them.

We now prove that the required sequence of thresholds $\{l_i\}_{i=1}^n$ exists with high probability. The proof is
very similar to that of Theorem 3.2 and is included in Appendix A.

Lemma 4.4. After updating the algorithm's upper and lower bounds using
$m = O((n(k+2))^2 \log((n(k+2))^2/\delta)/\delta)$ observations, when considering a new example $(p, B)$, the sequence of thresholds $\{l_i\}_{i=1}^n$ exists
with probability at least $1 - 2\delta$.

To conclude, we just need to show that if we find the thresholds with the desired properties, the returned
bundle is a good approximation of the optimum bundle. The proof can be found in Appendix A.

Theorem 4.5. For any $\epsilon > 0$, we can find some $k$ (the discretization factor) such that with probability at
least $1 - 2\delta$ over the choice of example $(p, B)$, the bundle $x = x(p, B)$ returned by our mechanism admits
at least one of the following properties:

1. For each item $1 \le i \le n$, we have that $x_i \ge x_i^* - \epsilon$,

2. $v^*(x) \ge v^*(x^*) - \epsilon$

In other words, we have that our mechanism is an efficient $(\epsilon, \delta)$-learning algorithm for the class of linearly
separable concave utility functions with bounded range $v : [0,1]^n \to [0,1]$.

5 A Learning Algorithm based on Sampling from a Convex Polytope



In this section, we present another learning algorithm for $(\epsilon, \delta)$-learning linear valuation functions. We introduce
a new model, that gets a stronger form of feedback from the agent, and as a result achieve an improved 
sample complexity bound that requires only m = O ^ BP2l^lM ^ observations. 

During the training phase of our algorithm, it will interact with the agent by adding constraints to a 
linear program and given a new example, propose a candidate bundle to the agent. The agent will either 
accept the candidate bundle (if it is approximately optimal), or else return to the algorithm a set of linear 
constraints witnessing the suboptimality of the proposed bundle. The main idea is that for each new example 
either our algorithm's bundle is almost optimal, or we receive a set of linear constraints to add to our linear 
program that substantially reduce the volume of the feasible polytope. If the set of constraints are restrictive 
enough, with high probability, we achieve an approximately optimum bundle on all new examples, and 
we can end the training phase. Otherwise each new example cuts off some constant fraction of the linear 
program polytope with high probability. After feeding a polynomial number of examples, and using some 
arguments to upper bound the volume of the polytope at the beginning and lower bound its volume at the 
end, we can prove with high probability, the algorithm finds an almost optimal bundle for future examples. 
First we explain the model, and then we present our algorithm. 

Model: We consider agents with linear utility functions, here bounded so that $v^* \in [0,1]^n$. If we have
that $v^* \cdot x \ge v^* \cdot x^* - \epsilon$, we say bundle $x$ is an $\epsilon$-additive approximation to the optimal bundle $x^* = x^*(v^*, p, B)$,
and it will be accepted by the agent if it is proposed. If a proposed bundle $x$ is not $\epsilon$-approximately optimal,
the agent rejects the bundle if proposed, and instead returns a set of inequalities which are witness to the
sub-optimality of our solution. The agent returns all valid inequalities of the following form for different
pairs of objects $i, j \in [n]$: $\frac{v_i^* - \epsilon'}{p_i} > \frac{v_j^* + \epsilon'}{p_j}$, where $\epsilon' = \epsilon/nM$, and $M$ is the maximum ratio of two different
prices in the domain of the price distribution ($\mathcal{D}$).

Intuitively, for these pairs we have that $\frac{v_i^*}{p_i}$ is greater than $\frac{v_j^*}{p_j}$ by some non-negligible margin. In the
following, we show that for any suboptimal bundle (not an $\epsilon$-additive approximation) resulting from a value
vector $v$, there exists at least one of these inequalities for which we have that $\frac{v_i}{p_i} \le \frac{v_j}{p_j}$. In other words,
this set of inequalities that the agent returns can be seen as evidence of suboptimality for any
suboptimal bundle for example $(p, B)$.
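
The feedback model can be mimicked by a small oracle. The Python sketch below is a hypothetical stand-in for the agent: it assumes the caller supplies the agent's optimal bundle and the price-ratio bound $M$, and the name `agent_feedback` is ours.

```python
from typing import List, Optional, Tuple


def agent_feedback(v_star: List[float], p: List[float], proposed: List[float],
                   optimal: List[float], eps: float, M: float
                   ) -> Optional[List[Tuple[int, int]]]:
    """Return None if the proposed bundle is accepted, otherwise the witness pairs (i, j)."""
    n = len(v_star)

    def value(x: List[float]) -> float:
        return sum(vi * xi for vi, xi in zip(v_star, x))

    # Accept any eps-additive approximation of the optimal bundle.
    if value(proposed) >= value(optimal) - eps:
        return None
    eps_prime = eps / (n * M)
    # Report every pair whose value-per-price gap exceeds the margin eps'.
    return [(i, j) for i in range(n) for j in range(n)
            if i != j
            and (v_star[i] - eps_prime) / p[i] > (v_star[j] + eps_prime) / p[j]]
```

For true values `[0.9, 0.2]` and unit prices, proposing the bundle `[0, 1]` is rejected with the single witness pair `(0, 1)`.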

Lemma 5.1. For any pair of price vector and budget $(p, B)$, and a suboptimal sampled value vector $v$ (that
does not generate an $\epsilon$-approximately optimal bundle $x$), there exists at least one pair of items $(i, j)$ such
that we have $\frac{v_i^* - \epsilon'}{p_i} > \frac{v_j^* + \epsilon'}{p_j}$ and $\frac{v_i}{p_i} \le \frac{v_j}{p_j}$.

Proof. Let $x^*$ and $x$ be the optimal bundle and the returned bundle based on $v$ respectively. We note that
since all objects have non-negative values, we have that $x^* \cdot p = x \cdot p = B$ unless the budget $B$ is enough to
buy all objects, in which case both $x^*$ and $x$ are equal to $(1, 1, \ldots, 1)$, which is a contradiction because we
assumed $x$ is suboptimal.

We can exchange $r/p_i$ units of object $i$ with $r/p_j$ units of item $j$ and vice versa without violating the
budget constraint. We show that all the differences in entries of $x^*$ and $x$ can be seen as the sum of at most $n$
of these simple exchanges between pairs of objects as follows. We take two entries $i$ and $j$ such that $x_i^* > x_i$
and $x_j^* < x_j$. We note that as long as the two vectors $x^*$ and $x$ are not the same, we can find such a pair because
we also have that $x^* \cdot p = x \cdot p$. Without loss of generality, assume that $(x_i^* - x_i) p_i \le (x_j - x_j^*) p_j$. Now we
buy $x_i^* - x_i$ more units of item $i$ in bundle $x$ to make the two entries associated with object $i$ in bundles $x^*$
and $x$ equal. In exchange, we buy $(x_i^* - x_i) p_i / p_j$ fewer units of object $j$ to obey the budget limit $B$. This way,
we decrease the number of different entries in $x^*$ and $x$, so after at most $n$ exchanges we make $x$ equal to
$x^*$. By assumption, $v^*(x) < v^*(x^*) - \epsilon$. Therefore, in at least one of these exchanges, the value of $x$ is
increased by more than $\epsilon/n$.

Assume this increase happened in the exchange of objects $i$ and $j$. Let $r$ be $(x_i^* - x_i) p_i$. We bought $r/p_i$
more units of $i$, and $r/p_j$ fewer units of $j$. The increase in value is
$r(v_i^*/p_i - v_j^*/p_j) = (x_i^* - x_i)(v_i^* - v_j^* p_i/p_j) > \epsilon/n$. Since $x_i^* - x_i$ is at most 1, we also have that $v_i^* - v_j^* p_i/p_j > \epsilon/n$, which can be rewritten
as: $v_i^* - \epsilon/2n > v_j^* p_i/p_j + \epsilon/2n$. This is equivalent to $\frac{v_i^* - \epsilon/2n}{p_i} > \frac{v_j^* + (\epsilon/2n)(p_j/p_i)}{p_j}$. We can conclude that
$\frac{v_i^* - \epsilon'}{p_i} > \frac{v_j^* + \epsilon'}{p_j}$.
We also note that $x_i < 1$ and $x_j > 0$, so we can infer that $\frac{v_i}{p_i} \le \frac{v_j}{p_j}$. Otherwise one could exchange some
fraction of $j$ with some fraction of $i$ and gain more value with respect to value vector $v$. This completes the
proof of both inequalities claimed in this lemma. □

Algorithm: We maintain a linear program with $n$ variables representing a hypothesis value vector $v$.
Since $v$ is in $[0,1]^n$, we initially have the constraints: $0 \le v_i \le 1$ for any $1 \le i \le n$. At any given time, our
set of constraints forms a convex body $K$.

Our algorithm loops until we reach a desired property. At each step of the loop we sample $C \log(n) \log(1/\delta)$
examples, and for each of them we sample uniformly at random a vector v from the convex body K, and 
predict the optimal bundle based on this sampled vector. (Note that uniform sampling from a convex body 
can be done in polynomial time by HDFK91I0 . At the end of the loop, we add the linear constraints that we 
obtained as feedback from the agent to our linear program, and get a more restricted version of K which we 
call K' . 
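
For a linear valuation, predicting the optimal bundle from a sampled vector reduces to a fractional knapsack. The sketch below assumes each good can be bought in any fraction between 0 and 1 and that all prices are positive; `optimal_bundle` is a hypothetical helper name used only for illustration.

```python
def optimal_bundle(v, p, B):
    """Greedy fractional knapsack: buy goods in decreasing order of v_i / p_i.

    Assumes x_i in [0, 1] for every good and strictly positive prices.
    """
    order = sorted(range(len(v)), key=lambda i: v[i] / p[i], reverse=True)
    x = [0.0] * len(v)
    remaining = B
    for i in order:
        if remaining <= 0:
            break
        x[i] = min(1.0, remaining / p[i])  # buy as much of good i as the budget allows
        remaining -= x[i] * p[i]
    return x


bundle = optimal_bundle([0.9, 0.8, 0.4], [1.0, 2.0, 4.0], 2.0)
```

In this example the first good is bought fully, the second is bought until the budget runs out, and the third is not bought at all.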

If the volume of $K'$ is greater than $1 - \delta$ times the volume of $K$, we stop the learning algorithm, and
return $K$ as the candidate convex body. Otherwise, we replace $K$ with the new more constrained body $K'$,
and repeat the same loop again. To avoid confusion, we name the final returned convex body K. After the 
training phase ends, for future examples, our algorithm samples a value vector v uniformly at random from 
this convex body K, and predicts the optimal bundle based on v. We explain what kinds of constraints we 
add at the end of each loop to find K'. 

Each iteration of the training phase uses $\frac{C\log(n)\log(1/\delta)}{\delta^2}$ examples. Recall that for each one, the
mechanism proposes a bundle to the agent, who either accepts or rejects it. For each rejected bundle, we are
given a set of pairs of objects $(i, j)$ such that $\frac{v_i - \epsilon'}{p_i} > \frac{v_j + \epsilon'}{p_j}$. For each inequality like this, we add the looser
constraint $\frac{v_i}{p_i} \ge \frac{v_j}{p_j}$. At the end, we have a more restricted convex body $K'$ which is formed by adding all of
these constraints to $K$.
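
One way to picture the update and the stopping rule is the sketch below. It is only an illustration under simplifying assumptions: the convex body is stored as a list of pairwise constraints inside the unit cube, uniform sampling uses naive rejection sampling (a stand-in that can be exponentially slow, not the polynomial-time sampler cited in the text), and the volume ratio is estimated by Monte Carlo. All function names are hypothetical.

```python
import random


def satisfies(v, constraints):
    """Each constraint (i, j, p_i, p_j) encodes v_i / p_i >= v_j / p_j."""
    return all(v[i] * pj >= v[j] * pi for (i, j, pi, pj) in constraints)


def sample_from_body(n, constraints, rng, max_tries=100_000):
    """Uniform sample from {v in [0,1]^n : constraints hold} by rejection."""
    for _ in range(max_tries):
        v = [rng.random() for _ in range(n)]
        if satisfies(v, constraints):
            return v
    raise RuntimeError("rejection sampling failed; the body may be too small")


def volume_ratio(n, old, new, rng, samples=2000):
    """Monte Carlo estimate of vol(K') / vol(K), where K' adds `new` to `old`."""
    kept = sum(satisfies(sample_from_body(n, old, rng), new) for _ in range(samples))
    return kept / samples


def should_stop(n, old, new, delta, rng):
    """Stopping rule from the text: stop when vol(K') > (1 - delta) * vol(K)."""
    return volume_ratio(n, old, new, rng) > 1 - delta


rng = random.Random(0)
new_constraints = [(0, 1, 1.0, 1.0)]  # v_0 / 1 >= v_1 / 1 cuts the unit square in half
ratio = volume_ratio(2, [], new_constraints, rng)
```

Here a single constraint removes about half of the unit square, so the loop would continue with the smaller body rather than stop.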

We must show that after the training phase of the algorithm terminates, we are left with a hypothesis 
which succeeds at predicting valuable bundles with high probability. We must also bound the number
of iterations (and therefore the number of examples used by the algorithm) before the training phase of the 
algorithm terminates. First we bound the total number of iterations of the training phase. 

Lemma 5.2. The total number of examples sampled by our algorithm is at most

$$m = O\left(\frac{n\log(n)(\log(n) + \log(M))\log(1/\epsilon)\log(1/\delta)}{\delta^3}\right).$$



Finally, we argue that after the learning phase terminates, the algorithm returns a good hypothesis. 
Theorem 5.3. The algorithm $(\epsilon, \delta)$-learns from the set of linear utility functions.

Proof. Given a new example $(p, B)$, the algorithm samples a value vector $v$ uniformly at random from the
convex body $K$, and returns an optimal bundle with respect to $v$, $p$, and $B$.

Consider a price vector $p$ and budget $B$. For some value vectors in $K$, the returned bundle is suboptimal
(not an $\epsilon$-additive approximation). We call this subset the set of suboptimal value vectors with respect to
$(p, B)$, and the fraction of suboptimal value vectors in $K$ is the probability that our algorithm does not return
a good bundle, i.e. the error probability of our algorithm. We say a pair $(p, B)$ is unlucky if for more than
a $\delta$ fraction of value vectors in $K$, the returned bundle is suboptimal. We prove that with probability at least
$1 - \delta/2$, the convex body $K$ we return has the property that with probability at most $\delta/2$, the pair $(p, B)$
drawn from $\mathcal{D}$ is unlucky. This way with probability at most $\delta/2 + \delta/2 = \delta$, the pair $(p, B)$ is unlucky,
which proves that our algorithm is an $(\epsilon, \delta)$-learner.

We prove the claim by contradiction. Define $A$ to be the event that "with probability more than $\delta/2$, the
pair $(p, B) \sim \mathcal{D}$ is unlucky". We prove that the probability of event $A$ is at most $\delta/2$. Let $K_i$ be the convex
body at the beginning of iteration $i$, and $K'_i$ be the more restricted version of $K_i$ that we compute at the end
of iteration $i$. Event $A$ holds if for some $i$ we have these two properties: a) the probability that a pair $(p, B)$
drawn i.i.d. from $\mathcal{D}$ is unlucky with respect to $K_i$ is more than $\delta/2$, i.e. if we sample the value vector from
$K_i$, the returned bundle for $(p, B)$ is suboptimal with probability more than $\delta$; b) the volume of $K'_i$ is not
less than $1 - \delta$ times the volume of $K_i$.

We bound the probability of having both of these properties at iteration $i$. In this iteration, for every
example we take, with probability more than $\delta/2$, the pair $(p, B)$ is unlucky. For an unlucky pair $(p, B)$,
with probability more than $\delta$, we return a suboptimal bundle, and then we get feedback from the agent.
Using Lemma 5.1 and the feedback we get from the agent, all of the suboptimal value vectors for pair $(p, B)$
will be removed from $K_i$ and will not exist in $K'_i$ (by the new constraints we add in this loop). Since $(p, B)$
is unlucky, more than a $\delta$ fraction of $K_i$ will be deleted in this case. In other words, for each example
in loop $i$, with probability at least $\delta^2/2$, more than a $\delta$ fraction of $K_i$ will be removed. Clearly, since $K'_i$ has
volume at least a $1 - \delta$ fraction of $K_i$, this has not happened for any of the examples of loop $i$. Since we
have $\frac{C\log(n)\log(1/\delta)}{\delta^2}$ examples in each loop, the probability of both these properties holding at loop $i$ is at
most $(1 - \delta^2/2)^{C\log(n)\log(1/\delta)/\delta^2} < \delta/(2n^C)$ for $\delta < 1/2$. Since there are fewer than $n^C$ loops for some
large enough constant $C$, the probability of event $A$ (which might happen in any of the loops) is less than
$\delta/2$. □



6 Discussion 

In this paper we have considered the problem of efficiently learning predictive classifiers from revealed 
preferences. We feel that the revealed preferences problem is much more meaningful when the observed 
data must be rationalized with a predictive hypothesis, and of course much remains to be done in this study. 
Our work leaves many open questions: 

1. What are tight bounds on the sample complexity for $\delta$-learning linear valuation functions? There is
a simple $\Omega(n)$ lower bound, and here we give an algorithm with sample complexity $O(n^2/\delta)$, but
where does the truth lie?

2. Is there a general measure of sample complexity, akin to VC-dimension in the classical learning 
setting, that can be fruitfully applied to the revealed preferences problem? Beigman and Vohra [BV06]
adapt the notion of fat-shattering dimension to this setting, but applied to the revealed preferences 
problem, fat shattering dimension is cumbersome and seems ill-suited to proving tight polynomial 
bounds. 

A Omitted Proofs

Proof of Lemma WT\ For each possible pair of items $i \ne j \in [n]$, we consider three cases:

• $0 < x^*_i, x^*_j < 1$: In this case, $\frac{v_i}{p_i} = \frac{v_j}{p_j}$. Otherwise (if for example the expression corresponding
to item $i$ is greater), we could buy $\epsilon'/p_i$ additional units of $i$, and buy $\epsilon'/p_j$ fewer units of $j$ without
violating the budget constraint. When $\epsilon' \to 0$, this exchange will be beneficial for the agent, which
would contradict optimality.

• $x^*_i = 1$ and $x^*_j < 1$: In this case, $\frac{v_i}{p_i} \ge \frac{v_j}{p_j}$; otherwise the agent could buy fewer units of $i$ and
additional units of $j$ and thereby increase the value of the bundle, contradicting optimality.

• $x^*_i > 0$ and $x^*_j = 0$: Identically to above, $\frac{v_i}{p_i} \ge \frac{v_j}{p_j}$.

To complete the proof we need now only to select $r$. If there exists some $1 \le i \le n$ such that $0 <
x^*_i < 1$, setting $r = v_i/p_i$ proves the claim. Otherwise, any value $r \in [\max_{i \mid x^*_i = 0} v_i/p_i,\ \min_{i \mid x^*_i = 1} v_i/p_i]$
completes the proof. □
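
The choice of $r$ at the end of this proof can be written out directly. The sketch below assumes the lemma concerns a threshold on value-to-price ratios that separates fully bought goods from unbought ones; `threshold_interval` is a hypothetical name, and the tolerance is only there to classify entries as 0, 1, or fractional.

```python
def threshold_interval(v, p, x_star, tol=1e-9):
    """Return (lo, hi) such that any r in [lo, hi] works as the threshold.

    If some good is bought fractionally, its ratio v_i / p_i is the only choice.
    Otherwise r may lie between the best unbought ratio and the worst fully
    bought ratio.
    """
    ratios = [vi / pi for vi, pi in zip(v, p)]
    fractional = [r for r, x in zip(ratios, x_star) if tol < x < 1 - tol]
    if fractional:
        return fractional[0], fractional[0]
    lo = max((r for r, x in zip(ratios, x_star) if x <= tol), default=0.0)
    hi = min((r for r, x in zip(ratios, x_star) if x >= 1 - tol), default=float("inf"))
    return lo, hi


v = [0.9, 0.8, 0.4]
p = [1.0, 2.0, 4.0]
with_fractional = threshold_interval(v, p, [1.0, 0.5, 0.0])
without_fractional = threshold_interval(v, p, [1.0, 1.0, 0.0])
```

With a fractionally bought good the interval collapses to a single point; without one, any value between the two boundary ratios is valid.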



Proof of Lemma POl First let us assume that there is a sequence of thresholds with the desired properties.
In this case, we may find it as follows. Suppose item $i$ has the maximum value of $\frac{V(i, l_i+1)}{p_i}$ among all items,
i.e. $\frac{V(i, l_i+1)}{p_i} \ge \frac{V(j, l_j+1)}{p_j}$ for any $j$. We assume that this item $i$ and threshold $l_i$ are given, because we
can guess their values as there are $n(k + 2)$ possible choices for them. For any item $j \ne i$, we select some
$l_j$ such that it can be inferred from our upper and lower bounds that $\frac{V(j, l_j)}{p_j} \ge \frac{V(i, l_i+1)}{p_i}$ but it can not be
inferred that $\frac{V(j, l_j+1)}{p_j} \ge \frac{V(i, l_i+1)}{p_i}$.

Since we have $V(j, -1) = \infty$ and $V(j, k+1) = 0$, we can always find some value for $l_j$. In fact for each
item $j$, we can find two thresholds $0 \le t_1(j) \le t_2(j) \le k$ such that a) we can infer that $\frac{V(j, t_1(j))}{p_j} \ge \frac{V(i, l_i+1)}{p_i}$;
b) we can also infer that $\frac{V(j, t')}{p_j} = \frac{V(i, l_i+1)}{p_i}$ for any $t_1(j) < t' \le t_2(j)$; and finally, c) we can not infer
that $\frac{V(j, t_2(j)+1)}{p_j} \ge \frac{V(i, l_i+1)}{p_i}$. Variable $l_j$ could be any integer in the range $[t_1(j), t_2(j)]$. We might sometimes have
that $t_1(j) = t_2(j)$, which means that $l_j$ is uniquely defined.

Assuming object $i$ has the maximum value of $\frac{V(i, l_i+1)}{p_i}$, we know that any solution $l_j \in [t_1(j), t_2(j)]$
(for all $j \ne i$) satisfies the first property we are looking for. The second property is a budget constraint: we
should be able to buy $\max\{l_j, 0\}/k$ units of each item $j$, and the total cost of buying $\min\{l_j + 1, k\}/k$ of
each item $j$ should be at least $B$.

We start with thresholds $l_j = t_1(j)$. If these are not feasible (i.e. if the resulting bundle costs more than
$B$), there does not exist such a sequence of thresholds with object $i$ as the object with maximum $\frac{V(i, l_i+1)}{p_i}$
and $l_i$ as the threshold of object $i$. Alternately, if these thresholds are feasible, we increase the thresholds one
at a time while the cost of the resulting optimal bundle remains below $B$. We have the freedom to increase
threshold $l_j$ in the range $[t_1(j), t_2(j)]$, and we can increase it one unit at a time to a maximum of $t_2(j)$.

We stop when it is not possible to increase any of the thresholds any more. This process results in a set of
thresholds $l_j \in [t_1(j), t_2(j)]$, and it is not possible to increase any of them.

If for some $j$, $l_j$ is strictly less than $t_2(j)$, we can infer that budget $B$ is not enough to buy $\max\{l_{j'}, 0\}/k$
units of each object $j' \ne j$, and $(l_j+1)/k = \min\{l_j+1, k\}/k$ units of object $j$ (note that $l_j + 1 \le t_2(j) \le k$).
(Otherwise we could have increased the threshold $l_j$ by at least one.) Consequently for this sequence of
thresholds, there is not enough budget to buy $\min\{l_{j''} + 1, k\}/k$ units of item $j''$ for all $1 \le j'' \le n$.
Therefore, this sequence satisfies both properties we wanted.

In the remaining case, we stop at $l_j = t_2(j)$ for all $j \ne i$. In this case, if the cost of buying $\min\{l_{j'} + 1, k\}/k$
units of all items $1 \le j' \le n$ is at least $B$, this sequence of thresholds $l_1, l_2, \ldots, l_n$ satisfies both properties
that we want. Otherwise, we must try another guess for object $i$ and threshold $l_i$ to start again. We try all
$n(k + 2)$ possible guesses exhaustively for pair $(i, l_i)$, and if in one of them we succeed in finding a sequence
of valid thresholds, we are done; otherwise there does not exist such a sequence, and our algorithm simply
returns a random bundle. (The probability that this occurs will be folded into the error probability of our
algorithm.) □

Proof of Lemma 4.4. Similar to Lemma 3.2 we define $a_{i,r,j,s}$ and $b_{i,r,j,s}$ where $i$ and $j$ are two objects, and
$0 \le r, s \le k$ as follows:

a iirijjS = min |a|Pr (r/k < x* < (r + l)/k & s/k < x* < (s + l)/k & a < g < < (n(fc ^, 2))2 

W = max \b\Pr [r/k < x* < (r + l)/k & s/k < x* < (s + l)/k k ||| < | < b]j < j^p^y 

Every time we see an example, with probability $\delta/(n(k+2))^2$, we update the lower bound $L(i, r, j, s + 1)$ to
something equal to or greater than $a_{i,r,j,s}$. A similar claim holds for the upper bounds and the corresponding $b$ values.
Similar to the proof of Lemma 3.2 one can show that with probability at most $(1 - \delta/(n(k + 2))^2)^m \le
\delta/(n(k + 2))^2$ the lower bound $L(i, r, j, s + 1)$ is less than $a_{i,r,j,s}$ after observing all $m$ examples. Using
the union bound this does not happen for any pairs of $(i, r)$ and $(j, s)$ with probability at least $1 - \delta$. So
with high probability, we have very accurate bounds on the ratios of different first derivatives at points
$0, 1/k, 2/k, \ldots, 1$.

In the classification phase of the algorithm, consider a new example drawn from $\mathcal{D}$. We prove that
with probability $1 - \delta$, we can find the thresholds $\{l_i\}_{i=1}^n$, if the lower bound $L_{i,r,j,s+1}$ is at least $a_{i,r,j,s}$ for
different choices of $i$, $j$, $r$, and $s$. So we assume these inequalities hold. Define $l_i$ to be $\lfloor k x^*_i \rfloor$ for $1 \le i \le n$
with $x^*_i > 0$; if $x^*_i$ is zero, we define $l_i$ to be $-1$. We know that $L_{i,l_i,j,l_j+1}$ is at least $a_{i,l_i,j,l_j}$, so we can infer

that > v{hl J +1) unless G \a irjs , 4t4t1 which occurs with probability at most S/(n(k + 2)) 2 . 

Pi Pj Pj ' V j\ X j' 

Taking a union bound again, and considering all choices of $i, j, r, s$, we find that this does not happen for
any 4-tuple $(i, j, r, s)$ except with probability at most $\delta$.

Therefore we have shown that the thresholds $\{l_i\}_{i=1}^n$ exist and can be inferred based on our upper and
lower bounds with probability $1 - 2\delta$. Thus our lower bounds are enough to imply the first property of
the thresholds $\{l_i\}_{i=1}^n$. We also note that the second property is satisfied because of the choices of $l_i$. Clearly
$\max\{l_i, 0\}/k$ is at most $x^*_i$, and therefore the total cost of buying $\max\{l_i, 0\}/k$ of each object $i$ is at most
the cost of the optimum bundle, which is $B$. We also know that $\min\{l_i + 1, k\}/k$ is at least $x^*_i$, which gives us
the remaining inequality needed for the second property of the thresholds. □
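
The second (budget) property of the thresholds can be checked on a small instance. This sketch assumes the optimal bundle spends exactly $B$ and uses the definition from the proof: the threshold is the floor of $k x^*_i$ for positive entries and $-1$ for zero entries. The helper names are hypothetical.

```python
import math


def thresholds(x_star, k):
    """l_i = floor(k * x*_i) when x*_i > 0, and -1 when x*_i = 0."""
    return [math.floor(k * x) if x > 0 else -1 for x in x_star]


def budget_sandwich(x_star, p, k):
    """Cost of the rounded-down and rounded-up bundles induced by the thresholds."""
    l = thresholds(x_star, k)
    low = sum(max(li, 0) / k * pi for li, pi in zip(l, p))
    high = sum(min(li + 1, k) / k * pi for li, pi in zip(l, p))
    return low, high


x_star = [1.0, 0.5, 0.0]
p = [1.0, 2.0, 4.0]
B = sum(x * pi for x, pi in zip(x_star, p))  # cost of the optimal bundle
l = thresholds(x_star, 4)
low, high = budget_sandwich(x_star, p, 4)
```

The rounded-down bundle never costs more than $B$ and the rounded-up bundle never costs less, which is exactly the pair of inequalities the proof needs.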

Proof of Theorem 4.5. Using Lemma 4.4 we know that for any example $(p, B)$, our algorithm finds
thresholds $\{l_i\}_{i=1}^n$ with probability $1 - 2\delta$. We prove that our bundle has one of the two properties in the
statement of this theorem in these cases (when our algorithm finds appropriate thresholds). There are two
cases:

• There exists some item $i$ such that $x^*_i \ge (l_i + 1)/k$. We have that $\frac{V(j, l_j)}{p_j} \ge \frac{V(i, l_i+1)}{p_i}$ for any
$j \ne i$. Therefore $x^*_j \ge l_j/k$ for each $1 \le j \le n$.

Therefore the proposed bundle is completely consistent with the optimum solution up to an $l_{i'}/k$ fraction
for each item $1 \le i' \le n$. We spend the rest of our budget to buy equal fractions of objects with
$0 \le l_{i'} < k$, but the optimum algorithm might do something else. Based on the second property of
thresholds, the remaining budget is not enough to buy more than a $1/k$ fraction of these objects, so in
the second step we buy some fraction of at most $1/k$ of all objects with $0 \le l_{i'} < k$ to spend our budget
completely.

To compare the performance of the optimum algorithm and our algorithm on the remaining budget,
we will focus on some small part of the remaining budget. With a very small budget $\epsilon' > 0$, we might
buy some fraction of object $j$ (with $0 \le l_j < k$) to increase its quantity by $\epsilon'/p_j$. We know that
since the value function for object $j$, $v_j$, is concave with bounded second derivative, our increase in
value is at least $(v'_j(l_j/k) - Q/k)\epsilon'/p_j$, where $Q$ is an upper bound on all values of second derivatives
of value functions. This holds because the fraction of object $j$, when we are increasing it, lies in the range
$[l_j/k, (l_j + 1)/k]$, and clearly has difference at most $1/k$ from fraction $l_j/k$. Therefore the first
derivative of the value function of object $j$ can not be less than $v'_j(l_j/k) - Q/k$ when we are increasing
it.

On the other hand, the optimum solution might use this $\epsilon'$ budget to buy an $\epsilon'/p_{j'}$ fraction of object $j'$.
Since the fraction of object $j'$ is in the range $[l_{j'}/k, 1]$ when the optimum is buying from $j'$, the increase in
the value can not be more than $(v'_{j'}((l_{j'} + 1)/k) + Q/k)\epsilon'/p_{j'}$. This holds because the fraction of
object $j'$ is at most $1/k$ less than $(l_{j'} + 1)/k$. This means that the optimum solution is gaining at
most $(v'_{j'}((l_{j'} + 1)/k) + Q/k)/p_{j'} - (v'_j(l_j/k) - Q/k)/p_j$ more value per unit of budget in
comparison to our algorithm. Since we have that $\frac{V(j, l_j)}{p_j} \ge \frac{V(j', l_{j'}+1)}{p_{j'}}$, this term (the difference in
values per unit of budget) is at most $Q(1/p_j + 1/p_{j'})/k$. In order to make the total difference in the
values of the two bundles at most $\epsilon$, one just needs to set $k \ge \left\lceil (2Q/\epsilon) \cdot \max_{(p,B) \sim \mathcal{D},\, 1 \le j \le n} \frac{B}{p_j} \right\rceil$.

• In the second case, for all $1 \le i \le n$, $x^*_i < (l_i + 1)/k$, and our fraction for this object is at least $l_i/k$;
for $k \ge 1/\epsilon$, the first property in the statement of this theorem holds.

□ 
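
The derivative bound used in the first case can be sanity-checked numerically. The sketch below uses a hypothetical concave valuation $v(x) = x - x^2/2$, chosen only for illustration because its second derivative is exactly $-1$, so $Q = 1$ is a valid bound; none of these names come from the paper.

```python
Q = 1.0  # assumed bound on the magnitude of the second derivative


def dv(x):
    """First derivative of the illustrative valuation v(x) = x - x^2 / 2."""
    return 1.0 - x


def marginal_lower_bound(l, k):
    """Lower bound v'(l/k) - Q/k on the derivative anywhere in [l/k, (l+1)/k]."""
    return dv(l / k) - Q / k


def check_interval(l, k, steps=100):
    """Verify the bound on a grid of points inside [l/k, (l+1)/k]."""
    bound = marginal_lower_bound(l, k)
    return all(dv((l + t / steps) / k) >= bound - 1e-12 for t in range(steps + 1))


all_intervals_ok = all(check_interval(l, 5) for l in range(5))
```

For this particular valuation the bound is tight at the right endpoint of each interval, which is why the check allows a small floating-point tolerance.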

Proof of Lemma 5.2. By construction, each time we update the convex body $K$, we reduce its volume by
a factor of $1 - \delta$ using $\frac{C\log(n)\log(1/\delta)}{\delta^2}$ examples. So it suffices to show that we will not do these updates
more than $O([n(\log(n) + \log(M))\log(1/\epsilon)]/\delta)$ times. We note that the volume of $K$ is 1 at the beginning.
We prove a lower bound on the volume of the final convex body $K$ by showing that some points will not
be deleted in any of the iterations. We say a value vector $v'$ is close to the actual value vector $v$ if for any
$1 \le i \le n$, we have that $|v_i - v'_i| \le \epsilon'$. We claim that a vector $v' \in [0, 1]^n$ which is close to $v$ will not be
removed in any of the loops. We prove this by contradiction.

Suppose $v'$ has been removed by adding some constraint on a pair of objects $(i, j)$ with price vector $p$.
We should have that $\frac{v'_i}{p_i} < \frac{v'_j}{p_j}$. Since we added this constraint, we also should have that $\frac{v_i - \epsilon'}{p_i} > \frac{v_j + \epsilon'}{p_j}$.
But this is a contradiction since $v'_i \ge v_i - \epsilon'$ and $v'_j \le v_j + \epsilon'$. So we never remove points from the set
$(v + [-\epsilon', \epsilon']^n) \cap [0, 1]^n$.

We note that for each $1 \le i \le n$, the length of the interval $[v_i - \epsilon', v_i + \epsilon'] \cap [0, 1]$ is at least $\epsilon'$, so the
volume of the set of points $(v + [-\epsilon', \epsilon']^n) \cap [0, 1]^n$ is at least $(\epsilon')^n$, which is a lower bound on the volume of $K$.
Therefore the number of iterations can not be more than
$\log_{1-\delta}\left((\epsilon')^n\right) = \frac{n\log(1/\epsilon')}{\log(1/(1-\delta))} \le \frac{n\log(1/\epsilon')}{\delta}$. By definition
of $\epsilon'$, the number of loops is $O\left(\frac{n(\log(n)+\log(M))\log(1/\epsilon)}{\delta}\right)$. So the total number of examples we use to learn $K$
is $O\left(\frac{Cn\log(n)(\log(n)+\log(M))\log(1/\epsilon)\log(1/\delta)}{\delta^3}\right)$. □
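
The iteration bound follows from a volume that starts at 1, shrinks by a factor of at least $1 - \delta$ per update, and can never drop below $(\epsilon')^n$. The sketch below computes the resulting cap on the number of updates and its simpler upper bound; the function name and the sample parameter values are illustrative assumptions.

```python
import math


def max_iterations(n, eps_prime, delta):
    """Cap on the number of volume-shrinking updates.

    exact:  the t solving (1 - delta)^t = eps_prime^n
    simple: the looser bound n * ln(1 / eps_prime) / delta,
            valid because ln(1 / (1 - delta)) >= delta
    """
    exact = n * math.log(1 / eps_prime) / math.log(1 / (1 - delta))
    simple = n * math.log(1 / eps_prime) / delta
    return exact, simple


exact, simple = max_iterations(5, 0.01, 0.1)
remaining_volume = (1 - 0.1) ** exact  # volume left after `exact` updates
```

After `exact` updates the remaining volume equals $(\epsilon')^n$, so no further update that removes a $\delta$ fraction is possible without deleting points the proof shows are never removed.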